emersodb
Collaborator

PR Type

Feature

Short Description

Clickup Ticket(s): https://app.clickup.com/t/868fge8mt

Adding two more quality metrics from SynthEval to the library: Hellinger distance and pMSE. These are the penultimate quality metrics to be added (only an F1 measure remains, in the next ticket, to close off the quality measures).

NOTE: The SynthEval implementation of the Hellinger distance for numerical columns was flawed, so I brought the implementation in-house and fixed the issue.
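
For readers unfamiliar with the metric, here is a minimal sketch of a Hellinger distance for a numerical column. It is not the code from this PR, nor SynthEval's, and it does not try to reproduce the flaw mentioned above, which the description does not detail. The function names (`hellinger_distance`, `shared_histogram`, `hellinger_numerical`) and the default of 10 bins are assumptions for illustration. The sketch bins the real and synthetic values on the same equal-width edges, so the two histograms are comparable, then applies the standard formula H(P, Q) = sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2 / 2).

```python
import math
from collections.abc import Sequence


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def shared_histogram(
    values: Sequence[float], low: float, high: float, n_bins: int
) -> list[float]:
    """Bin values into n_bins equal-width bins on [low, high]; return proportions."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for value in values:
        # A zero width means every value is identical, so use the first bin.
        index = int((value - low) / width) if width > 0 else 0
        # The maximum value lands on the right edge; keep it in the last bin.
        counts[min(index, n_bins - 1)] += 1
    return [count / len(values) for count in counts]


def hellinger_numerical(
    real: Sequence[float], synthetic: Sequence[float], n_bins: int = 10
) -> float:
    """Hellinger distance between two numerical samples (hypothetical helper)."""
    # Bin edges come from the combined range so both histograms share them.
    low = min(min(real), min(synthetic))
    high = max(max(real), max(synthetic))
    p = shared_histogram(real, low, high, n_bins)
    q = shared_histogram(synthetic, low, high, n_bins)
    return hellinger_distance(p, q)
```

The result lies in [0, 1]: identical samples give 0, and samples whose histograms do not overlap give 1. The choice of binning affects the value, so any comparison across columns or datasets should hold it fixed.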

Tests Added

A number of tests have been created to make sure the measures work as expected and produce the correct values across a number of situations.

"D104", # Ignore package level docstrings requirement
"D205", # 1 blank line required between summary line and description
"D212", # Multi-line docstring summary should start at the first line
"D301", # r-strings for docstrings with backslashes
Collaborator Author

This needs to be brought in if we're going to have latex in our docstrings.
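
As background, D301 is the docstring rule that asks for a raw-string (`r"""`) docstring whenever the docstring contains a backslash. The sketch below is illustrative and not taken from this PR; the function names are hypothetical. It shows why backslashes matter for LaTeX in docstrings: in a non-raw string, `\f` is the form-feed escape, so `\frac` is silently altered unless the string is raw or the backslash is doubled.

```python
def raw_docstring() -> None:
    r"""Return nothing; documents $\frac{1}{2}$ using a raw docstring."""


def escaped_docstring() -> None:
    """Return nothing; documents $\\frac{1}{2}$ using a doubled backslash."""


# In a plain (non-raw) literal, "\f" is parsed as a form feed (0x0c),
# so the LaTeX command loses its backslash and its first letter.
plain_literal = "\frac"
raw_literal = r"\frac"

print(repr(plain_literal))
print(repr(raw_literal))
print("\\frac" in raw_docstring.__doc__, "\\frac" in escaped_docstring.__doc__)
```

With the rule ignored, non-raw docstrings containing LaTeX are accepted by the linter, so escapes such as `\f`, `\t`, or `\n` inside formulas still need to be checked by hand.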

@emersodb emersodb marked this pull request as ready for review September 17, 2025 13:35
@emersodb emersodb changed the base branch from main to dbe/fixing_mypy September 24, 2025 15:22
Base automatically changed from dbe/fixing_mypy to main September 24, 2025 15:57
Collaborator

@bzamanlooy bzamanlooy left a comment

I just have a few questions that I added in the review :)

Collaborator

@fatemetkl fatemetkl left a comment

LGTM! Thorough tests and very easy-to-follow documentation.
I’ve just added a few minor comments.

@emersodb emersodb merged commit 91150fd into main Sep 30, 2025
5 checks passed
@emersodb emersodb deleted the dbe/add_hellinger_pmse branch September 30, 2025 13:12
bzamanlooy pushed a commit that referenced this pull request Oct 10, 2025
04d817f Trainer: Changing all literals to enums (#48)
33f9306 Adding ignore for pip 25.2 vulnerability, removing stale ones (#50)
c8d30ca Remove mkdocs build dir, add to gitignore (#49)
c01121c End-to-end Evaluation Script Example (#45)
d231a4a Add Nearest Neighbor Distance Ratio and Epsilon Identifiability Privacy Metrics (#42)
53e423b Adding Mean F1 Score Difference and Hitting Rate Metrics (#39)
91150fd Adding in Hellinger and pMSE metrics (#38)
746644e Tightening Ruff Configuration (#46)
580d55f Adding data_split_ratios to both the diffusion config and the classifier config (#47)
72863be Refactoring core.logger into common.logger and removing it (#41)
bef53bf Train code split, Part 4: moving some of the model.py code into dataset.py (#40)
80b0154 Upgrading pip to latest version to solve security issue (#44)
83beba6 New mypy flow and fixes to typing issues that were discovered (#43)
7e77f37 Train code split, Part 3: moving some of the model.py code into sampler.py (#9)

git-subtree-dir: deps/midst-toolkit
git-subtree-split: 04d817f
3 participants